    Gradient-free Policy Architecture Search and Adaptation

    We develop a method for policy architecture search and adaptation via gradient-free optimization that can learn to perform autonomous driving tasks. By learning from both demonstration and environmental reward, we develop a model that can learn with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment. Comment: Accepted in Conference on Robot Learning, 201
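
    As a rough illustration of what "gradient-free optimization" of policy parameters can look like, the sketch below runs a simple hill-climbing search: it perturbs the parameters with Gaussian noise and keeps a candidate only if the reward does not decrease. This is a generic technique, not the search procedure used in the paper; `hill_climb`, `toy_reward`, and all hyperparameters are hypothetical names and values chosen for the example.

```python
import random


def hill_climb(reward_fn, init_params, n_steps=200, sigma=0.1, seed=0):
    """Gradient-free hill climbing: keep a perturbation only if reward does not drop.

    Assumption: reward_fn maps a list of floats to a scalar reward. No gradients
    of reward_fn are ever computed.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    best = list(init_params)
    best_reward = reward_fn(best)
    for _ in range(n_steps):
        # Propose a candidate by adding Gaussian noise to every parameter.
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        reward = reward_fn(candidate)
        if reward >= best_reward:
            best, best_reward = candidate, reward
    return best, best_reward


def toy_reward(params):
    """Hypothetical stand-in for an environment reward; maximal at all params == 1."""
    return -sum((p - 1.0) ** 2 for p in params)


if __name__ == "__main__":
    params, reward = hill_climb(toy_reward, [0.0, 0.0])
    print(params, reward)
```

    Because a candidate is accepted only when its reward is at least the current best, the returned reward can never be lower than the reward of the initial parameters.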

    Beyond Invariance: Test-Time Label-Shift Adaptation for Distributions with "Spurious" Correlations

    Changes in the data distribution at test time can have deleterious effects on the performance of predictive models p(y|x). We consider situations where there are additional meta-data labels (such as group labels), denoted by z, that can account for such changes in the distribution. In particular, we assume that the prior distribution p(y, z), which models the dependence between the class label y and the "nuisance" factors z, may change across domains, either due to a change in the correlation between these terms, or a change in one of their marginals. However, we assume that the generative model for features p(x|y, z) is invariant across domains. We note that this corresponds to an expanded version of the widely used "label shift" assumption, where the labels now also include the nuisance factors z. Based on this observation, we propose a test-time label shift correction that adapts to changes in the joint distribution p(y, z) using EM applied to unlabeled samples from the target domain distribution, p_t(x). Importantly, we are able to avoid fitting a generative model p(x|y, z), and merely need to reweight the outputs of a discriminative model p_s(y, z|x) trained on the source distribution. We evaluate our method, which we call "Test-Time Label-Shift Adaptation" (TTLSA), on several standard image and text datasets, as well as the CheXpert chest X-ray dataset, and show that it improves performance over methods that target invariance to changes in the distribution, as well as baseline empirical risk minimization methods. Code for reproducing experiments is available at https://github.com/nalzok/test-time-label-shift. Comment: 24 pages, 7 figures
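
    The reweighting idea in this abstract can be sketched with the classic EM procedure for prior (label) shift. The code below is a minimal illustration, not the authors' implementation (see their repository for that). It assumes each joint label (y, z) has been flattened into a single class index, that `source_probs` holds the source model's outputs p_s(y, z|x) on unlabeled target samples, and that every entry of `source_prior` is strictly positive. The function names are hypothetical.

```python
def em_label_shift(source_probs, source_prior, n_iter=100, tol=1e-8):
    """Estimate the target prior over flattened (y, z) classes with EM.

    source_probs: list of per-sample probability lists from the source model.
    source_prior: source-domain class prior (all entries must be > 0).
    """
    n = len(source_probs)
    k = len(source_prior)
    target_prior = list(source_prior)  # initialise at the source prior
    for _ in range(n_iter):
        totals = [0.0] * k
        for probs in source_probs:
            # E-step: reweight source posteriors by the prior ratio, then normalise.
            weighted = [p * t / s for p, t, s in zip(probs, target_prior, source_prior)]
            norm = sum(weighted)
            for j in range(k):
                totals[j] += weighted[j] / norm
        # M-step: the new prior is the average adapted posterior.
        new_prior = [t / n for t in totals]
        delta = max(abs(a - b) for a, b in zip(new_prior, target_prior))
        target_prior = new_prior
        if delta < tol:
            break
    return target_prior


def adapt_posteriors(source_probs, source_prior, target_prior):
    """Reweight source posteriors to the estimated target prior."""
    adapted = []
    for probs in source_probs:
        weighted = [p * t / s for p, t, s in zip(probs, target_prior, source_prior)]
        norm = sum(weighted)
        adapted.append([w / norm for w in weighted])
    return adapted


if __name__ == "__main__":
    probs = [[0.9, 0.1]] * 8 + [[0.1, 0.9]] * 2
    prior = em_label_shift(probs, [0.5, 0.5])
    print(prior)
    print(adapt_posteriors(probs[:1], [0.5, 0.5], prior))
```

    In this toy case, with eight samples favouring class 0 and two favouring class 1, the estimated target prior moves away from the uniform source prior toward class 0. To recover p(y|x) from the adapted joint posterior, one would sum the adapted probabilities over all z values that share the same y.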

    LANISTR: Multimodal Learning from Structured and Unstructured Data

    Multimodal large-scale pretraining has shown impressive performance for unstructured data including language, image, audio, and video. However, a prevalent real-world scenario involves the combination of structured data types (tabular, time-series) with unstructured data, which has so far been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured data. The core of LANISTR's methodology is rooted in masking-based training applied across both unimodal and multimodal levels. In particular, we introduce a new similarity-based multimodal masking loss that enables it to learn cross-modal relations from large-scale multimodal data with missing modalities. On two real-world datasets, MIMIC-IV (healthcare) and Amazon Product Review (retail), LANISTR demonstrates remarkable absolute improvements of 6.6% (AUROC) and up to 14% (accuracy) when fine-tuned on 0.1% and 0.01% of labeled data, respectively, compared to the state-of-the-art alternatives. Notably, these improvements are observed even in the presence of considerable missingness ratios of 35.7% and 99.8% in the respective datasets.

    Generalized Zero- and Few-Shot Learning via Aligned Variational Autoencoders

    Many approaches in generalized zero-shot learning rely on cross-modal mapping between the image feature space and the class embedding space. As labeled images are expensive, one direction is to augment the dataset by generating either images or image features. However, the former misses fine-grained details and the latter requires learning a mapping associated with class embeddings. In this work, we take feature generation one step further and propose a model where a shared latent space of image features and class embeddings is learned by modality-specific aligned variational autoencoders. This leaves us with the required discriminative information about the image and classes in the latent features, on which we train a softmax classifier. The key to our approach is that we align the distributions learned from images and from side-information to construct latent features that contain the essential multi-modal information associated with unseen classes. We evaluate our learned latent features on several benchmark datasets, i.e. CUB, SUN, AWA1 and AWA2, and establish a new state of the art on generalized zero-shot as well as on few-shot learning. Moreover, our results on ImageNet with various zero-shot splits show that our latent features generalize well in large-scale settings. Comment: Accepted at CVPR 201

    ASPEST: Bridging the Gap Between Active Learning and Selective Prediction

    Selective prediction aims to learn a reliable model that abstains from making predictions when the model uncertainty is high. These predictions can then be deferred to a human expert for further evaluation. In many real-world scenarios, however, the distribution of test data is different from the training data. This results in more inaccurate predictions, necessitating increased human labeling, which is difficult and expensive in many scenarios. Active learning circumvents this difficulty by only querying the most informative examples and, in several cases, has been shown to lower the overall labeling effort. In this work, we bridge the gap between selective prediction and active learning, proposing a new learning paradigm called active selective prediction which learns to query more informative samples from the shifted target domain while increasing accuracy and coverage. For this new problem, we propose a simple but effective solution, ASPEST, that trains ensembles of model snapshots using self-training with their aggregated outputs as pseudo labels. Extensive experiments on several image, text and structured datasets with domain shifts demonstrate that active selective prediction can significantly outperform prior work on selective prediction and active learning (e.g. on the MNIST→SVHN benchmark with the labeling budget of 100, ASPEST improves the AUC metric from 79.36% to 88.84%) and achieves more optimal utilization of humans in the loop.
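
    Two ingredients named in this abstract, aggregating the outputs of several model snapshots and abstaining when confidence is low, can be sketched as follows. This is a simplified illustration under stated assumptions, not the ASPEST algorithm itself: aggregation is taken to be a plain average of class probabilities, confidence is taken to be the maximum averaged probability, and the threshold value is arbitrary. `aggregate` and `select_and_pseudo_label` are hypothetical names.

```python
def aggregate(snapshot_probs):
    """Average class probabilities over snapshots.

    snapshot_probs[s][i][c] is snapshot s's probability of class c for sample i.
    """
    n_snapshots = len(snapshot_probs)
    n_samples = len(snapshot_probs[0])
    n_classes = len(snapshot_probs[0][0])
    return [
        [sum(snap[i][c] for snap in snapshot_probs) / n_snapshots
         for c in range(n_classes)]
        for i in range(n_samples)
    ]


def select_and_pseudo_label(avg_probs, threshold):
    """Split samples into confident pseudo-labeled ones and deferred ones.

    Assumption: confidence is the maximum averaged class probability.
    Returns (accepted, deferred), where accepted maps sample index -> pseudo label
    and deferred lists the indices on which the ensemble abstains.
    """
    accepted, deferred = {}, []
    for i, probs in enumerate(avg_probs):
        confidence = max(probs)
        if confidence >= threshold:
            accepted[i] = probs.index(confidence)
        else:
            deferred.append(i)
    return accepted, deferred


if __name__ == "__main__":
    snapshots = [
        [[0.9, 0.1], [0.6, 0.4]],  # snapshot 1: two samples, two classes
        [[0.8, 0.2], [0.3, 0.7]],  # snapshot 2
    ]
    avg = aggregate(snapshots)
    print(select_and_pseudo_label(avg, threshold=0.8))
```

    In a self-training loop of this kind, the accepted samples would serve as pseudo-labeled training data, while the deferred samples are candidates for a human label; how the labeling budget is actually spent is a design decision this sketch leaves open.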